A Hough Transform based Technique for Text Segmentation

Authors

  • Satadal Saha
  • Subhadip Basu
  • Mita Nasipuri
  • Dipak Kumar Basu
Abstract

Text segmentation is an inherent part of any OCR system, irrespective of its application domain. An OCR system contains a segmentation module in which text lines, words, and ultimately characters must be segmented properly for successful recognition. The present work implements a Hough transform based technique for line and word segmentation from digitized images. The proposed technique is applied not only to a document image dataset but also to datasets for a business card reader system and a license plate recognition system. To standardize the evaluation of the system's performance, the technique is also applied to a public domain dataset published on the website of CMATER, Jadavpur University. The document images consist of multi-script printed and handwritten text lines, with script and line spacing varying within a single document image. The technique performs quite satisfactorily when applied to low-resolution business card images captured with a mobile camera. The usefulness of the technique is verified by applying it in a commercial project to localize vehicle license plates in surveillance camera images through the segmentation process itself. The word segmentation accuracy of the technique, as verified experimentally, is 85.7% for document images, 94.6% for business card images, and 88% for surveillance camera images.
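The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind Hough-based text line detection, not the authors' implementation. It assumes the input is a list of foreground pixel coordinates (or connected-component centroids), that text lines are roughly horizontal, and that a single strongest line is wanted. The function name `hough_peak` and the angle range are hypothetical choices made for illustration.

```python
import math


def hough_peak(points, theta_degrees=range(85, 96)):
    """Return (rho, theta_deg, votes) for the strongest line through `points`.

    Lines are written in normal form: rho = x*cos(theta) + y*sin(theta).
    A horizontal text line on row y = c therefore shows up as a peak at
    theta = 90 degrees, rho = c. Returns None when `points` is empty.
    """
    votes = {}
    for theta_deg in theta_degrees:
        theta = math.radians(theta_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            # Each point votes for the one (rho, theta) cell it lies on.
            rho = round(x * cos_t + y * sin_t)
            key = (rho, theta_deg)
            votes[key] = votes.get(key, 0) + 1
    if not votes:
        return None
    (rho, theta_deg), count = max(votes.items(), key=lambda kv: kv[1])
    return rho, theta_deg, count


# Synthetic "text line": 90 foreground pixels on row 12, plus two stray pixels.
points = [(x, 12) for x in range(5, 95)] + [(3, 30), (50, 2)]
print(hough_peak(points))  # (12, 90, 90)
```

Restricting the angles to a narrow band around 90 degrees reflects the assumption of near-horizontal text; a skewed or multi-line page would need a wider band and more than one peak, and word segmentation would require a further step that this sketch does not attempt.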


Similar resources

Line And Word Segmentation of Handwritten Documents

In this paper, we present a segmentation methodology of a handwritten document in its distinct entities namely text lines and words. Text line segmentation is achieved making use of the Hough Transform on a subset of the connected components of the document image. Also, a post-processing step includes the correction of possible false alarms, the creation of text lines that Hough Transform faile...


Hough Transform-based Technique for Automated Carbon Nanocone Segmentation

A new technique to automatically segment the L-shaped carbon nanocone structures from Transmission Electronic Microscopy (TEM) images is described. The technique enables robust segmentation of the structures by exploiting a simplified Generalized Hough Transform (HT)-based processing. Exploitation of parallelism on commodity hardware is also explored for efficient processing. Effectiveness of t...


Text line and word segmentation of handwritten documents

In this paper, we present a segmentation methodology of handwritten documents in their distinct entities, namely, text lines and words. Text line segmentation is achieved by applying Hough transform on a subset of the document image connected components. A post-processing step includes the correction of possible false alarms, the detection of text lines that Hough transform failed to create and...


Iris localization by means of adaptive thresholding and Circular Hough Transform

In this paper, a new iris localization method for mobile devices is presented. Our system uses both intensity and saturation threshold on the captured eye images to determine iris boundary and sclera area, respectively. Estimated iris boundary pixels which have been placed outside the sclera will be removed. The remaining pixels are mainly the boundary of iris inside the sclera. Then, circular ...


Range Image Segmentation Using Randomized Hough Transform

Range image analysis is one of the most important subjects in the fields of computer vision and pattern recognition, and range image segmentation is the key of the analysis. This paper presents a range image segmentation technique based on RHT (Randomized Hough Transform). The proposed technique has the advantage of insensitivity to noise. Experiments were performed in a popular range image dat...



Journal title:
  • CoRR

Volume: abs/1002.4048

Publication date: 2010